9 research outputs found

Social, Structured and Semantic Search

Authors: Bonaque Raphaël; Cautis Bogdan; Goasdoué François; Manolescu Ioana
Publication venue: HAL CCSD
Publication date: 15/03/2016

International audience

Social content such as blogs, tweets, and news is a rich source of interconnected information. We identify a set of requirements for the meaningful exploitation of such rich content, and present a new data model, called S3, which is the first to satisfy them. S3 captures social relationships between users, and between users and content, but also the structure present in rich social content, as well as its semantics. We provide the first top-k keyword search algorithm taking into account the social, structured, and semantic dimensions and formally establish its termination and correctness. Experiments on real social networks demonstrate the efficiency and qualitative advantage of our algorithm through the joint exploitation of the social, structured, and semantic dimensions of S3.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Polytechnique

HAL-Rennes 1

Mixed-instance querying: a lightweight integration architecture for data journalism

Authors: Bonaque Raphaël; Cao Tien Duc; Cautis Bogdan; Goasdoué François; Letelier Javier; Manolescu Ioana; Mendoza Oscar; Ribeiro Swen; Tannier Xavier; Thomazo Michaël
Publication venue: HAL CCSD
Publication date: 05/09/2016

International audience

As the world's affairs become increasingly digital, the timely production and consumption of news requires exploiting heterogeneous data sources efficiently and quickly. Discussions with journalists revealed that the content management tools currently at their disposal fall far short of expectations. We demonstrate TATOOINE, a lightweight data integration prototype that allows users to quickly set up integration queries across (very) heterogeneous data sources, capitalizing on the many data links (joins) available in this application domain. Our demonstration is based on scenarios we study in collaboration with Le Monde, France's major newspaper.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Polytechnique

HAL-Rennes 1

Top-k search over rich web content

Author: Bonaque Raphaël
Publication date: 30/09/2016

Social networks are increasingly present in our everyday lives and are fast becoming our primary means of information and communication. As they contain more and more data about our surroundings and ourselves, it becomes vital to access and analyze this data. Currently, the primary means of querying this data is top-k keyword search: you enter a few words and the social network service sends back a fixed number of documents it deems relevant. In current top-k searches in a social context, the relevance of a document is evaluated based on two factors: the overlap between the query keywords and the words of the document, and the social proximity between the document and the user making the query. We argue that this is limited and propose to take into account the complex interactions between the users linked to the document, as well as its structure and the meaning of the words it contains rather than their phrasing. To this end, we highlight the requirements for a model fully integrating structured, semantic, and social data and propose a new model, called S3, satisfying these requirements. We introduce querying capabilities to S3 and develop an algorithm, S3k, for customizable top-k keyword search on S3. We prove the correctness of our algorithm and propose an implementation of it. We compare this implementation with another top-k keyword search approach in a social context, using datasets created from real-world data, and show their differences and the benefits of our approach.

Theses.fr

Recherche top-k pour le contenu du Web

Author: Bonaque Raphaël
Publication venue: HAL CCSD
Publication date: 30/09/2016

Social networks are increasingly present in our everyday lives and are fast becoming our primary means of information and communication. As they contain more and more data about our surroundings and ourselves, it becomes vital to access and analyze this data. Currently, the primary means of querying this data is top-k keyword search: you enter a few words and the social network service sends back a fixed number of documents it deems relevant. In current top-k searches in a social context, the relevance of a document is evaluated based on two factors: the overlap between the query keywords and the words of the document, and the social proximity between the document and the user making the query. We argue that this is limited and propose to take into account the complex interactions between the users linked to the document, as well as its structure and the meaning of the words it contains rather than their phrasing. To this end, we highlight the requirements for a model fully integrating structured, semantic, and social data and propose a new model, called S3, satisfying these requirements. We introduce querying capabilities to S3 and develop an algorithm, S3k, for customizable top-k keyword search on S3. We prove the correctness of our algorithm and propose an implementation of it. We compare this implementation with another top-k keyword search approach in a social context, using datasets created from real-world data, and show their differences and the benefits of our approach.

HAL-CentraleSupelec

Thèses en Ligne

HAL-Rennes 1

Recherche sur du contenu structuré, social et sémantique

Authors: Bonaque Raphaël; Cautis Bogdan; Goasdoué François; Manolescu Ioana
Publication venue: HAL CCSD
Publication date: 01/10/2015

Social content such as blogs, tweets, and news is a rich source of interconnected information. We identify a set of requirements for the meaningful exploitation of such rich content, and present a new data model, called S3, which is the first to satisfy them. S3 captures social relationships between users, and between users and content, but also the structure present in rich social content, as well as its semantics. We provide the first top-k keyword search algorithm taking into account the social, structured, and semantic dimensions and formally establish its termination and correctness. Experiments on real social networks demonstrate the efficiency and qualitative advantage of our algorithm through the joint exploitation of the social, structured, and semantic dimensions of S3.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Toward Social, Structured and Semantic Search

Authors: Bonaque Raphaël; Cautis Bogdan; Goasdoué François; Manolescu Ioana
Publication venue: HAL CCSD
Publication date: 19/10/2014

International audience

Social content such as social network posts, tweets, news articles and, more generally, web page fragments is often structured. Such social content is also frequently enriched with annotations, most of which carry semantics, produced either by collaborative effort or by automatic tools. Searching for relevant information in this context is both a basic feature for users and a challenging task. We present a data model and a preliminary approach for answering queries over such structured, social, and semantically rich content, taking into account all dimensions of the data in order to return the most meaningful results.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Recherche Sociale, Structurée et Sémantique

Authors: Bonaque Raphaël; Cautis Bogdan; Goasdoué François; Manolescu Ioana
Publication venue: HAL CCSD
Publication date: 01/11/2016

National audience

Social content such as blogs, tweets, and news is a rich source of interconnected information. We identify a set of requirements for the meaningful exploitation of such rich content, and present a new data model, called S3, which is the first to satisfy them. S3 captures social relationships between users, and between users and content, but also the structure present in rich social content, as well as its semantics. We show how S3 instances are derived from content and relationships present in today's social media, and provide the first top-k keyword search algorithm taking into account the social, structured, and semantic dimensions and formally establish its termination and correctness. Experiments on real social networks demonstrate the efficiency and qualitative advantage of our algorithm through the joint exploitation of the social, structured, and semantic dimensions of S3.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Polytechnique

HAL-Rennes 1